Safer implementation of RepeatN #130887

Soveu · 2024-09-26T16:31:38Z

I've seen the "Use MaybeUninit for RepeatN" commit while reading This Week In Rust and immediately thought about something I've written some time ago - https://github.com/Soveu/repeat_finite/blob/master/src/lib.rs.

Using the fact, that Option will find niche in (T, NonZeroUsize), we can construct something that has the same size as (T, usize) while completely getting rid of MaybeUninit.
This leaves only unsafe on TrustedLen, which is pretty neat.

rustbot · 2024-09-26T16:31:46Z

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Soveu · 2024-09-26T17:34:49Z

huh, I'm no llvm expert, buuut...

          8: ; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: readwrite) uwtable 
           9: define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr #0 { 
          10: start: 
          11:  %0 = load i64, ptr %it, align 8, !alias.scope !3, !noundef !6 
          12:  %1 = icmp eq i64 %0, 0 
          13:  br i1 %1, label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit", label %bb8.i 
          14:  
          15: bb8.i: ; preds = %start 
          16:  %_10.0.i = add i64 %0, -1 
          17:  %2 = icmp eq i64 %_10.0.i, 0 
          18:  br i1 %2, label %bb5.i, label %bb3.i 
          19:  
          20: bb3.i: ; preds = %bb8.i 
          21:  store i64 %_10.0.i, ptr %it, align 8, !alias.scope !3 
  next:29      !~~~~~~~~~~~~~~~~~                                     error: match on wrong line
          22:  %_14.i = getelementptr inbounds i8, ptr %it, i64 8 
          23:  %_14.val.i = load i16, ptr %_14.i, align 8, !alias.scope !3, !noundef !6 
          24:  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit" 
          25:  
          26: bb5.i: ; preds = %bb8.i 
          27:  %3 = getelementptr inbounds i8, ptr %it, i64 8 
          28:  %4 = load i16, ptr %3, align 8, !alias.scope !7 
          29:  store i64 0, ptr %it, align 8, !alias.scope !7 
          30:  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit" 
          31:  
          32: "_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit": ; preds = %start, %bb3.i, %bb5.i 
          33:  %_0.sroa.5.0.i = phi i16 [ %4, %bb5.i ], [ %_14.val.i, %bb3.i ], [ undef, %start ] 
          34:  %_0.sroa.0.0.i = phi i16 [ 1, %bb5.i ], [ 1, %bb3.i ], [ 0, %start ] 
          35:  %5 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0 
          36:  %6 = insertvalue { i16, i16 } %5, i16 %_0.sroa.5.0.i, 1 
          37:  ret { i16, i16 } %6 
          38: }

bb3.i and bb5.i if not for the store i64 they would look identical.

The other functions (vec_extend_via_repeat_n and array_repeat_not_copy) look ok.

slanterns · 2024-10-15T12:59:47Z

Maybe you can ping scottmcm (test author) to see if the change in codegen result is acceptable (then adjust the test.)

Dylan-DPC · 2024-11-28T15:16:35Z

@Soveu any updates on the failed test? thanks

Soveu · 2024-11-29T17:37:05Z

@Soveu any updates on the failed test? thanks

I guess we don't want to introduce additional branches in the generated code, so for "production" code this implementation would be unfit. @scottmcm could comment on that.

scottmcm · 2024-11-29T21:08:08Z

It is pretty neat, and in fact it's the implementation I originally wrote for this.

The biggest issue I had with it then, though, is that it makes the iterator be potentially-undef, but because of the #130141 bug (which reminds me to go push on rust-lang/rfcs#3712 ) we're stuck with that problem anyway, so there's probably a way you could make this work.

Looks like the problem is that it's having trouble figuring out that the store of zero and the store of count-1 can be merged, because otherwise there'd be no difference.

Basically, what this test is testing is that for !needs_drop types we don't need to consider the last iteration separately, because the difference between them is whether we need to call drop, but since we don't need to do that for !needs_drop it's not an issue, and thus we should be able to avoid that second branch. (We of course need the "is the iterator empty?" branch, but once we know it's not empty there should be no further branches needed for !needs_drop types.)

Nothing immediately jumps to mind for how to get LLVM to know that, though. Maybe try a separate if const { mem::needs_drop::<T>() } path that could avoid the take, and see if that helps? Or maybe there's a way to phrase this such that you compute the value to write in a local, then there's just one write of it back to self so LLVM just needs to merge the values instead of merging the stores?

I'd suggest copying the implementation to godbolt and playing around a bit there until you find an incantation that works.

Soveu · 2024-11-29T21:47:15Z

I have tried some things when first writing this code, with no effect, but the idea with mem::needs_drop made me try again. I was able to make the branches very similar, making LLVM merge them :) Technically there is still a case, where clone is more expensive than move, but the type does not have drop(). Probably it doesn't happen often, but might need consideration.

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return if core::mem::needs_drop::<A>() {
            self.take_element()
        } else {
            let tmp = inner.element.clone();
            self.inner = None;
            Some(tmp)
        };
    }

Godbolt: https://rust.godbolt.org/z/Kr198x635

LLVM output:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr personality ptr @rust_eh_personality {
start:
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %1 = load i64, ptr %0, align 8
  %2 = icmp eq i64 %1, 0
  br i1 %2, label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit", label %bb13.i

bb13.i:
  %count.i = add i64 %1, -1
  %3 = load i16, ptr %it, align 8
  store i64 %count.i, ptr %0, align 8
  br label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit"

"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit":
  %_0.sroa.6.0.i = phi i16 [ undef, %start ], [ %3, %bb13.i ]
  %_0.sroa.0.0.i = phi i16 [ 0, %start ], [ 1, %bb13.i ]
  %4 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0
  %5 = insertvalue { i16, i16 } %4, i16 %_0.sroa.6.0.i, 1
  ret { i16, i16 } %5
}

declare noundef range(i32 0, 10) i32 @rust_eh_personality(i32 noundef, i32 noundef range(i32 1, 17), i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

LLVM output for the current std implementation:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr {
start:
  %_3 = load i64, ptr %it, align 8
  %_2.not = icmp eq i64 %_3, 0
  br i1 %_2.not, label %bb4, label %bb1

bb1:
  %_2.i = add i64 %_3, -1
  store i64 %_2.i, ptr %it, align 8
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %result.i = load i16, ptr %0, align 8
  br label %bb4

bb4:
  %_0.sroa.3.0 = phi i16 [ %result.i, %bb1 ], [ undef, %start ]
  %_0.sroa.0.0 = phi i16 [ 1, %bb1 ], [ 0, %start ]
  %1 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0, 0
  %2 = insertvalue { i16, i16 } %1, i16 %_0.sroa.3.0, 1
  ret { i16, i16 } %2
}

From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

Soveu · 2024-11-29T21:59:44Z

That is interesting, just switching variables made it work 🤔

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            // This must be in this order!
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return self.take_element();
    }

scottmcm · 2024-11-29T22:01:53Z

just switching variables made it work 🤔

Definitely odd, but that's why we have codegen tests :P

scottmcm · 2024-11-29T22:35:19Z

From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

What that means is that it ended up getting a different layout -- originally it was storing the count at address, but apparently it flipped the order.

I don't have a strong feeling about which order is better, but if you want you could put #[repr(C)] // not stable just keeps the layout consistent for codegen tests on RepeatNInner to make it give you the order you want.

library/core/src/iter/sources/repeat_n.rs

…epeatn, r=<try> Use `iter::repeat_n` to implement `Vec::extend_with` This replaces the `Vec::extend_with` manual implementation, which is used by `Vec::resize` and `Vec` `SpecFromElem`, with `iter::repeat_n`. I've compared the codegen output between: 1. the current `Vec::resize` impl 2. this branch 3. this branch + rust-lang#130887 3 gives the closest codegen output to 1, with some output improvements. 2 doesn't look good: https://rust.godbolt.org/z/Yrc83EhjY. May also help rust-lang#120050?

Soveu · 2024-11-30T23:12:34Z

@scottmcm @Amanieu its ready

Soveu · 2024-11-30T23:15:33Z

@rustbot label +S-waiting-on-review -S-waiting-on-author

…epeatn, r=<try> Use `iter::repeat_n` to implement `Vec::extend_with` This replaces the `Vec::extend_with` manual implementation, which is used by `Vec::resize` and `Vec` `SpecFromElem`, with `iter::repeat_n`. I've compared the codegen output between: 1. the current `Vec::resize` impl 2. this branch 3. this branch + rust-lang#130887 3 gives the closest codegen output to 1, with some output improvements. 2 doesn't look good: https://rust.godbolt.org/z/Yrc83EhjY. May also help rust-lang#120050? --- WARNING: DO NOT MERGE - in order to run the perf run in rust-lang#133662 (comment) this PR currently also contains commits from rust-lang#130887

paolobarbolini · 2024-12-01T11:38:50Z

FYI we're also playing around with this change at #133662 since Compiler Explorer showed that the two MRs combined made Vec::resize slightly better. We'll see how the perf run goes.

Folyd · 2024-12-01T17:12:31Z

Nice, this reminds me of my article on Rust Magazine: VecDeque::resize() optimization
.

paolobarbolini · 2025-03-22T13:38:09Z

Any news? Also #138833 may create a merge conflict with this.

Amanieu · 2025-05-03T19:42:56Z

r? @scottmcm

scottmcm · 2025-05-16T02:37:25Z

library/core/src/iter/sources/repeat_n.rs

+}
+
+#[derive(Clone, Copy)]
+#[repr(C)] // keep the layout consistent for codegen tests


Sorry, I think I have you bad advice here in the past.

Rather than a repr here, I've since learned that you should instead add

//@ needs-deterministic-layouts

to any of the codegen tests that break under layout randomization.

scottmcm

This looks basically good to me, though I did see one thing that I put above.

Can you please rebase to resolve the merge conflict, then mark this ready again?

@rustbot author

rustbot · 2025-05-16T02:41:20Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot assigned Amanieu Sep 26, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 26, 2024

This comment has been minimized.

Sign in to view

Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 28, 2024

Soveu force-pushed the repeatn branch from 464f771 to ff05378 Compare November 29, 2024 22:08

This comment has been minimized.

Sign in to view

scottmcm reviewed Nov 29, 2024

View reviewed changes

library/core/src/iter/sources/repeat_n.rs Outdated Show resolved Hide resolved

paolobarbolini mentioned this pull request Nov 30, 2024

Use iter::repeat_n to implement Vec::extend_with #133662

Open

100% safe implementation of RepeatN

496db67

Soveu force-pushed the repeatn branch from ff05378 to 496db67 Compare November 30, 2024 21:36

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 30, 2024

paolobarbolini mentioned this pull request Dec 2, 2024

Vec::resize for bytes should be a single memset #120050

Draft

tgross35 changed the title ~~100% safe implementation of RepeatN~~ Safer implementation of RepeatN Dec 19, 2024

rustbot assigned scottmcm and unassigned Amanieu May 3, 2025

scottmcm reviewed May 16, 2025

View reviewed changes

scottmcm requested changes May 16, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safer implementation of RepeatN #130887

Safer implementation of RepeatN #130887

Soveu commented Sep 26, 2024

rustbot commented Sep 26, 2024

This comment has been minimized.

Soveu commented Sep 26, 2024

slanterns commented Oct 15, 2024

Dylan-DPC commented Nov 28, 2024

Soveu commented Nov 29, 2024

scottmcm commented Nov 29, 2024

Soveu commented Nov 29, 2024

Soveu commented Nov 29, 2024

scottmcm commented Nov 29, 2024

This comment has been minimized.

scottmcm commented Nov 29, 2024

Soveu commented Nov 30, 2024

Soveu commented Nov 30, 2024

paolobarbolini commented Dec 1, 2024

Folyd commented Dec 1, 2024

paolobarbolini commented Mar 22, 2025

Amanieu commented May 3, 2025

scottmcm May 16, 2025

scottmcm left a comment

rustbot commented May 16, 2025

Safer implementation of RepeatN #130887

Are you sure you want to change the base?

Safer implementation of RepeatN #130887

Conversation

Soveu commented Sep 26, 2024

rustbot commented Sep 26, 2024

This comment has been minimized.

Soveu commented Sep 26, 2024

slanterns commented Oct 15, 2024

Dylan-DPC commented Nov 28, 2024

Soveu commented Nov 29, 2024

scottmcm commented Nov 29, 2024

Soveu commented Nov 29, 2024

Soveu commented Nov 29, 2024

scottmcm commented Nov 29, 2024

This comment has been minimized.

scottmcm commented Nov 29, 2024

Soveu commented Nov 30, 2024

Soveu commented Nov 30, 2024

paolobarbolini commented Dec 1, 2024

Folyd commented Dec 1, 2024

paolobarbolini commented Mar 22, 2025

Amanieu commented May 3, 2025

scottmcm May 16, 2025

Choose a reason for hiding this comment

scottmcm left a comment

Choose a reason for hiding this comment

rustbot commented May 16, 2025